|Poster:||randomo||Date:||Oct 12, 2018 5:54am|
|Forum:||texts||Subject:||Recurrent typos in Lord of the Rings|
I was looking through the Archive's full-text file for The Lord of the Rings: The Return of the King (https://archive.org/stream/TheLordOfTheRing1TheFellowshipOfTheRing/The%20Return%20Of%20The%20King_djvu.txt) and noticed a recurring typo. In many places where there should have been an E, there was a J instead. For example, the character named Eowyn was repeatedly shown as Jowyn.
How can I report this to someone who can fix the text file? Thanks.
|Poster:||christ_chan64||Date:||Oct 23, 2018 5:14am|
|Forum:||texts||Subject:||Re: Recurrent typos in Lord of the Rings|
Basically, when a PDF is uploaded, page images are extracted from it and read by optical character recognition (OCR) software called ABBYY. The readout is saved as a file named filename_abbyy.gz. Every other format is ultimately derived from this OCR readout, including the _djvu.txt file.
Due to the imperfect nature of OCR technology, the readout is rarely 100% accurate and errors will creep in.
I doubt Archive employees have time to correct errors in OCR-ed text. If it were your item, I would suggest downloading the text file, correcting the errors, and reuploading it with the same filename. (I'm not sure whether the Archive software would overwrite the changes, though.)
Alternatively, you could make the corrections yourself and upload it as a separate item.
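If you do go the do-it-yourself route, the correction step could be scripted rather than done by hand. Here is a rough sketch in Python, assuming you have already downloaded the _djvu.txt file. The function name `fix_ocr_text` and the `CORRECTIONS` map are made up for illustration, and only the Jowyn entry comes from the report above; any other misreadings would need to be checked against the page scans before being added.

```python
import re

# Hypothetical correction map: OCR misreading -> intended word.
# Only "Jowyn" is taken from the report in this thread; extend it
# with other confirmed misreadings after checking the page images.
CORRECTIONS = {"Jowyn": "Eowyn"}


def fix_ocr_text(text, corrections=CORRECTIONS):
    """Replace whole-word OCR misreadings; return (fixed_text, replacement_count)."""
    if not corrections:
        return text, 0
    # \b keeps the match to whole words, so longer words that merely
    # contain a misreading are left alone.
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(word) for word in corrections) + r")\b"
    )
    return pattern.subn(lambda m: corrections[m.group(1)], text)


sample = "Then Jowyn rose up. Jowyn's sword rang."
fixed, count = fix_ocr_text(sample)
print(fixed)
print(count, "replacements")
```

To run it over the whole file, read the downloaded text into a string, pass it to `fix_ocr_text`, and write the result to a new file. A blanket J-to-E swap would be a bad idea, since it would also mangle legitimate words that start with J, which is why the sketch only touches words listed explicitly in the map.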